Towards Reliable Dermatology Evaluation Benchmarks
Benchmark datasets for digital dermatology unwittingly contain inaccuracies
that reduce trust in model performance estimates. We propose a
resource-efficient data cleaning protocol to identify issues that escaped
previous curation. The protocol pairs an existing algorithmic cleaning
strategy with a confirmation process that terminates according to an intuitive
stopping criterion. Based on confirmation by multiple dermatologists, we remove
irrelevant samples and near duplicates and estimate the percentage of label
errors in six dermatology image datasets for model evaluation promoted by the
International Skin Imaging Collaboration. Along with this paper, we publish
revised file lists for each dataset, which should be used for model evaluation.
Our work paves the way for more trustworthy performance assessment in digital
dermatology.
Comment: Link to the revised file lists:
https://github.com/Digital-Dermatology/SelfClean-Revised-Benchmark
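The abstract does not spell out the stopping criterion, so the sketch below is only an illustration of how such a confirmation loop could be wired up: candidates flagged by the algorithmic cleaning strategy are reviewed in ranked order, and review stops after a run of consecutive rejections. All names here (`confirm_candidates`, `confirm_fn`, `patience`) are hypothetical, not taken from the paper.

```python
def confirm_candidates(ranked_indices, confirm_fn, patience=20):
    """Review cleaning candidates in the order produced by an
    algorithmic ranking, stopping after `patience` consecutive
    rejections by the human rater.

    confirm_fn(idx) -> bool wraps the expert review of one sample.
    """
    confirmed = []
    consecutive_rejections = 0
    for idx in ranked_indices:
        if confirm_fn(idx):  # e.g. a dermatologist confirms the issue
            confirmed.append(idx)
            consecutive_rejections = 0
        else:
            consecutive_rejections += 1
            if consecutive_rejections >= patience:
                break  # the ranking has stopped surfacing true issues
    return confirmed

# Toy usage: pretend the first 30 ranked candidates are genuine issues.
hits = confirm_candidates(range(1000), lambda i: i < 30)
```

A run-of-rejections rule is intuitive in the sense the abstract suggests: once the ranked list stops surfacing confirmed issues, further review effort is unlikely to pay off, which keeps the protocol resource-efficient.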
SelfClean: A Self-Supervised Data Cleaning Strategy
Most benchmark datasets for computer vision contain irrelevant images, near
duplicates, and label errors. Consequently, model performance on these
benchmarks may not be an accurate estimate of generalization capabilities. This
is a particularly acute concern in computer vision for medicine, where datasets
are typically small, stakes are high, and annotation processes are expensive
and error-prone. In this paper, we propose SelfClean, a general procedure for
cleaning up image datasets by exploiting a latent space learned with
self-supervision. By relying on self-supervised learning, our approach focuses
on intrinsic properties of the data and avoids annotation biases. We formulate
dataset cleaning as either a set of ranking problems, which significantly
reduce human annotation effort, or a set of scoring problems, which enable
fully automated decisions based on score distributions. We demonstrate that
SelfClean achieves state-of-the-art performance in detecting irrelevant images,
near duplicates, and label errors within popular computer vision benchmarks,
retrieving both injected synthetic noise and natural contamination. In
addition, we apply our method to multiple image datasets and confirm an
improvement in evaluation reliability.
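SelfClean's actual scoring functions are defined in the project's codebase; as a rough sketch of the general idea, nearest-neighbour statistics over a self-supervised latent space can already yield per-sample suspicion scores for near duplicates and label errors. Everything below (the function name, the choice of cosine similarity, the neighbour count k) is an illustrative assumption, not the paper's method.

```python
import numpy as np

def selfclean_style_scores(embeddings, labels, k=5):
    """Score samples as near-duplicate or label-error candidates using
    nearest neighbours in a (self-supervised) embedding space.

    embeddings : (n, d) array of image representations
    labels     : (n,) array of integer class labels
    """
    # Cosine similarity between all pairs of embeddings.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # never match a sample with itself

    # Near-duplicate score: similarity to the closest other sample.
    dup_score = sim.max(axis=1)

    # Label-error score: fraction of the k nearest neighbours whose
    # label disagrees with the sample's own label.
    nn = np.argsort(-sim, axis=1)[:, :k]
    label_error_score = (labels[nn] != labels[:, None]).mean(axis=1)

    return dup_score, label_error_score

# Toy usage with random data standing in for real self-supervised features.
rng = np.random.default_rng(0)
dup, err = selfclean_style_scores(rng.normal(size=(100, 32)),
                                  rng.integers(0, 3, size=100))
suspects = np.argsort(-err)[:10]  # ten most likely label-error candidates
```

Sorting samples by either score yields the ranking formulation described above, which lets human effort concentrate on the top of the list, while thresholding the score distributions yields the fully automated scoring formulation.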